Choreographing Trash Cans: On Speculative Futures of Weak Robots in Public Spaces
Axelsson, Minja, Sikau, Lea Luka
Michio Okada first conceptualised "weak robots": robots with limited capabilities of their own, framed as objects or "social others" that people are invited to assist and take care of. In Okada's work, such robots are used to invite pro-social behaviour from people, for example encouraging them to pick up trash to assist a trash can robot (Okada, 2022). We conceptualise human-robot interaction (HRI) as a stage where weak robots, designed to be "cute" and vulnerable, play the role of incidental actors that subvert the person engaging with them. Caudwell and Lacey (2020) argue that cuteness as a design choice for robots can encourage users to trust and form relationships with those robots, which introduces ambivalent power dynamics through the production of intimacy. Indeed, cuteness can also be seen as a deceptive or "dark" pattern: it prompts affective responses that can be used to collect emotional data, and it reduces user agency to some degree (Lacey and Caudwell, 2019). The ability and affordances of cute and weak robots to influence user behaviour merit a discussion of their ethics, which we undertake in this paper through design fiction. Unlike traditional HRI research, often confined to laboratory settings, our focus is on spontaneous, real-world interactions that transform everyday environments into sites of performative potential. We argue that the theatricality of these encounters is central to understanding their impact: the presence of a weak and/or cute robot, such as the trash can robot developed by Okada and the Interaction and Communication Design Lab of the Toyohashi University of Technology, acts as a disruptive interloper that introduces an observer's effect and thus affects the human interlocutors. First, we examine the concept of weak robots through the lens of performativity theory as well as concepts of machine (dys)function.
Levels of Autonomy for AI Agents
Feng, K. J. Kevin, McDonald, David W., Zhang, Amy X.
Autonomy is a double-edged sword for AI agents, simultaneously unlocking transformative possibilities and serious risks. How can agent developers calibrate the appropriate levels of autonomy at which their agents should operate? We argue that an agent's level of autonomy can be treated as a deliberate design decision, separate from its capability and operational environment. In this work, we define five levels of escalating agent autonomy, characterized by the roles a user can take when interacting with an agent: operator, collaborator, consultant, approver, and observer. Within each level, we describe the ways by which a user can exert control over the agent and open questions for how to design the nature of user-agent interaction. We then highlight a potential application of our framework towards AI autonomy certificates to govern agent behavior in single- and multi-agent systems. We conclude by proposing early ideas for evaluating agents' autonomy. Our work aims to contribute meaningful, practical steps towards responsibly deployed and useful AI agents in the real world.
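The five escalating levels named in the abstract can be pictured as an ordered enumeration of user roles. The following sketch is purely illustrative (the class, level encoding, and the clamping helper are assumptions, not the paper's implementation); it shows how an ordered encoding would let an autonomy certificate cap the level an agent may request.

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """Hypothetical encoding of the five user roles, from most to
    least user control over the agent."""
    OPERATOR = 1      # user directs every step the agent takes
    COLLABORATOR = 2  # user and agent share the work
    CONSULTANT = 3    # agent does the work, consulting the user for input
    APPROVER = 4      # agent acts, pausing for user approval at key points
    OBSERVER = 5      # agent acts autonomously; user only monitors

def max_permitted_level(certified: AutonomyLevel,
                        requested: AutonomyLevel) -> AutonomyLevel:
    """Clamp a requested autonomy level to what a certificate permits."""
    return min(certified, requested)
```

Because the levels are totally ordered, certificate enforcement reduces to a single comparison: an agent certified as a CONSULTANT cannot be deployed as an OBSERVER.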
Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design
Hedman, Marcel, Ivanova, Desi R., Guan, Cong, Rainforth, Tom
We develop a semi-amortized, policy-based, approach to Bayesian experimental design (BED) called Stepwise Deep Adaptive Design (Step-DAD). Like existing, fully amortized, policy-based BED approaches, Step-DAD trains a design policy upfront before the experiment. However, rather than keeping this policy fixed, Step-DAD periodically updates it as data is gathered, refining it to the particular experimental instance. This test-time adaptation improves both the flexibility and the robustness of the design strategy compared with existing approaches. Empirically, Step-DAD consistently demonstrates superior decision-making and robustness compared with current state-of-the-art BED methods.
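The semi-amortized loop described above can be sketched in a few lines. All names here are assumptions for illustration (the abstract does not specify an interface): `policy` maps the experiment history to the next design, `run_experiment` queries the real system, and `update_policy` is the periodic test-time refit that distinguishes Step-DAD from fully amortized policy-based BED.

```python
def step_dad(policy, update_policy, run_experiment, T, refit_every):
    """Semi-amortized BED loop (sketch): deploy a pre-trained design
    policy, but periodically refit it on the data gathered so far."""
    history = []
    for t in range(T):
        design = policy(history)          # amortized: next design from history
        outcome = run_experiment(design)  # query the actual system
        history.append((design, outcome))
        if (t + 1) % refit_every == 0:
            # test-time adaptation to this particular experimental instance
            policy = update_policy(policy, history)
    return history
```

Setting `refit_every` larger than `T` recovers the fully amortized baseline, which makes the comparison in the abstract concrete: Step-DAD interpolates between a fixed upfront policy and per-step re-optimization.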
A General Approach of Automated Environment Design for Learning the Optimal Power Flow
Wolgast, Thomas, Nieße, Astrid
Reinforcement learning (RL) algorithms are increasingly used to solve the optimal power flow (OPF) problem. Yet, the question of how to design RL environments to maximize training performance remains unanswered, both for the OPF and the general case. We propose a general approach for automated RL environment design by utilizing multi-objective optimization. For that, we use the hyperparameter optimization (HPO) framework, which allows the reuse of existing HPO algorithms and methods. On five OPF benchmark problems, we demonstrate that our automated design approach consistently outperforms a manually created baseline environment design. Further, we use statistical analyses to determine which environment design decisions are especially important for performance, resulting in multiple novel insights on how RL-OPF environments should be designed. Finally, we discuss the risk of overfitting the environment to the utilized RL algorithm. To the best of our knowledge, this is the first general approach for automated RL environment design.
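Treating environment design decisions as hyperparameters means any HPO algorithm can search over them. The sketch below uses plain random search as the HPO stand-in (the paper's actual optimizer, search space, and objective are not specified here; `train_and_eval` is an assumed callback that trains an agent in the configured environment and returns a scalar score).

```python
import random

def auto_env_design(train_and_eval, search_space, n_trials, seed=0):
    """Automated RL environment design via random-search HPO (sketch).
    `search_space` maps each design decision to its candidate values."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        # sample one environment design (reward shape, observation set, ...)
        config = {k: rng.choice(v) for k, v in search_space.items()}
        score = train_and_eval(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Swapping random search for a multi-objective HPO method, as the abstract describes, changes only the sampling step; the design-as-hyperparameter framing stays the same.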
DRAFT-ing Architectural Design Decisions using LLMs
Dhar, Rudra, Kakran, Adyansh, Karan, Amey, Vaidhyanathan, Karthik, Varma, Vasudeva
Architectural Knowledge Management (AKM) is crucial for software development but remains challenging due to the lack of standardization and high manual effort. Architecture Decision Records (ADRs) provide a structured approach to capture Architecture Design Decisions (ADDs), but their adoption is limited due to the manual effort involved and insufficient tool support. Our previous work has shown that Large Language Models (LLMs) can assist in generating ADDs. However, simply prompting the LLM does not produce quality ADDs. Moreover, using third-party LLMs raises privacy concerns, while self-hosting them poses resource challenges. To this end, we experimented with different approaches like few-shot, retrieval-augmented generation (RAG) and fine-tuning to enhance the LLM's ability to generate ADDs. Our results show that both techniques improve effectiveness. Building on this, we propose Domain Specific Retrieval Augmented Few Shot Fine Tuning, DRAFT, which combines the strengths of all three approaches for more effective ADD generation. DRAFT operates in two phases: an offline phase that fine-tunes an LLM on generating ADDs augmented with retrieved examples and an online phase that generates ADDs by leveraging retrieved ADRs and the fine-tuned model. We evaluated DRAFT against existing approaches on a dataset of 4,911 ADRs and various LLMs and analyzed them using automated metrics and human evaluations. Results show DRAFT outperforms all other approaches in effectiveness while maintaining efficiency. Our findings indicate that DRAFT can aid architects in drafting ADDs while addressing privacy and resource constraints.
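The online phase described above combines retrieval with a fine-tuned model. A minimal sketch, with all function names and the prompt format assumed for illustration (the paper's actual prompt template and retriever are not given here):

```python
def draft_generate(context, adr_store, retrieve, fine_tuned_llm, k=3):
    """Online phase of a DRAFT-style pipeline (sketch): retrieve the k
    most similar past ADRs as few-shot examples, then prompt the
    fine-tuned LLM to draft the decision for the new context."""
    examples = retrieve(context, adr_store, k)  # [(context, decision), ...]
    prompt = "\n\n".join(
        f"Context: {c}\nDecision: {d}" for c, d in examples
    ) + f"\n\nContext: {context}\nDecision:"
    return fine_tuned_llm(prompt)
```

The offline phase would fine-tune `fine_tuned_llm` on exactly such retrieval-augmented prompts, so that training and inference see the same input distribution.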
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Kim, Moo Jin, Finn, Chelsea, Liang, Percy
Recent vision-language-action models (VLAs) build upon pretrained vision-language models and leverage diverse robot datasets to demonstrate strong task execution, language following ability, and semantic generalization. Despite these successes, VLAs struggle with novel robot setups and require fine-tuning to achieve good performance, yet how to most effectively fine-tune them is unclear given many possible strategies. In this work, we study key VLA adaptation design choices such as different action decoding schemes, action representations, and learning objectives for fine-tuning, using OpenVLA as our representative base model. Our empirical analysis informs an Optimized Fine-Tuning (OFT) recipe that integrates parallel decoding, action chunking, a continuous action representation, and a simple L1 regression-based learning objective to altogether improve inference efficiency, policy performance, and flexibility in the model's input-output specifications. We propose OpenVLA-OFT, an instantiation of this recipe, which sets a new state of the art on the LIBERO simulation benchmark, significantly boosting OpenVLA's average success rate across four task suites from 76.5% to 97.1% while increasing action generation throughput by 26$\times$. In real-world evaluations, our fine-tuning recipe enables OpenVLA to successfully execute dexterous, high-frequency control tasks on a bimanual ALOHA robot and outperform other VLAs ($\pi_0$ and RDT-1B) fine-tuned using their default recipes, as well as strong imitation learning policies trained from scratch (Diffusion Policy and ACT) by up to 15% (absolute) in average success rate. We release code for OFT and pretrained model checkpoints at https://openvla-oft.github.io/.
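One ingredient of the OFT recipe, the L1 regression objective over a chunk of continuous actions, is simple enough to state directly. This is a generic sketch of such a loss in pure Python (the tensor shapes and framework are assumptions; OpenVLA-OFT's actual implementation lives in the released code):

```python
def l1_action_loss(pred_chunk, target_chunk):
    """Mean L1 regression loss over a predicted action chunk (sketch).
    Each chunk is a sequence of continuous action vectors, i.e. a nested
    list of shape (chunk_len, action_dim)."""
    diffs = [abs(p - t)
             for pred_vec, target_vec in zip(pred_chunk, target_chunk)
             for p, t in zip(pred_vec, target_vec)]
    return sum(diffs) / len(diffs)
```

Regressing the whole chunk at once is what pairs naturally with parallel decoding: the model emits `chunk_len` actions in one forward pass instead of autoregressively, which is where the reported throughput gain comes from.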
Reviews: SURGE: Surface Regularized Geometry Estimation from a Single Image
The paper proposes a method for recovering scene geometry from a single RGB image. This method uses a dense CRF with terms that enforce consistency between point-wise depth and normal estimates, using regularizers based on classification of planarity and presence of depth boundaries. Each of these estimates (depth, normal, planarity, edges) comes from a separate network proposed for each task in prior work. In addition to the geometry terms in the proposed DCRF-based model, the paper's contributions include using multiple passes through the depth and normal networks with dropout to derive 'confidence' values for these estimates, and joint training to fine-tune the depth and normal networks. While significantly engineered for its specific application domain, the paper does demonstrate a successful example of inference with a regularized objective, where different terms are predicted from trained neural networks.
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Zohar, Orr, Wang, Xiaohan, Dubois, Yann, Mehta, Nikhil, Xiao, Tong, Hansen-Estruch, Philippe, Yu, Licheng, Wang, Xiaofang, Juefei-Xu, Felix, Zhang, Ning, Yeung-Levy, Serena, Xia, Xide
Despite the rapid advancements in language and image-language modeling (Hoffmann et al., 2022; Brown, 2020; Yang et al., 2024; Liu et al., 2024a; Alayrac et al., 2022; Laurençon et al., 2024a; OpenAI, 2024), the development of video Large Multimodal Models (video-LMMs) has not kept pace. Videos provide a rich, dynamic information source, capturing nuanced temporal and spatial features beyond the reach of static images. However, video-LMMs remain under-explored, hampered by unique challenges: notably higher computational demands and a broader, more complex design space compared to their image-based counterparts (Li et al., 2023a, 2025; Liu et al., 2024d; Li et al., 2024b; Xu et al., 2024a). Many fundamental questions about video-LMM design remain unanswered: How should videos be sampled? Which vision encoders yield optimal representations? What are the best practices for resampling video tokens? Early approaches primarily extended image-LMMs directly (Xu et al., 2024b; Kim et al., 2024; Wu, 2024; Zhang et al., 2024e) or with video-specific fine-tuning (Li et al., 2023a; Zhang et al., 2023; Maaz et al., 2023). Recent methods introduced diverse design choices, such as longer context windows (Zhang et al., 2024e), multi-modality mixing (Li et al., 2024a,c), agent workflows (Wang et al., 2024c), self-training (Zohar et al., 2024), and more. Despite these efforts, the impact of these design decisions on video-LMM performance is poorly understood.
Contrasting local and global modeling with machine learning and satellite data: A case study estimating tree canopy height in African savannas
Rolf, Esther, Gordon, Lucia, Tambe, Milind, Davies, Andrew
While advances in machine learning with satellite imagery (SatML) are facilitating environmental monitoring at a global scale, developing SatML models that are accurate and useful for local regions remains critical to understanding and acting on an ever-changing planet. As increasing attention and resources are being devoted to training SatML models with global data, it is important to understand when improvements in global models will make it easier to train or fine-tune models that are accurate in specific regions. To explore this question, we contrast local and global training paradigms for SatML through a case study of tree canopy height (TCH) mapping in the Karingani Game Reserve, Mozambique. We find that recent advances in global TCH mapping do not necessarily translate to better local modeling abilities in our study region. Specifically, small models trained only with locally-collected data outperform published global TCH maps, and even outperform globally pretrained models that we fine-tune using local data. Analyzing these results further, we identify specific points of conflict and synergy between local and global modeling paradigms that can inform future research toward aligning local and global performance objectives in geospatial machine learning.
Visualizing Extensions of Argumentation Frameworks as Layered Graphs
Nöllenburg, Martin, Pirker, Christian, Rapberger, Anna, Woltran, Stefan, Wulms, Jules
The visualization of argumentation frameworks (AFs) is crucial for enabling a wide applicability of argumentative tools. However, their visualization is often considered only as an accompanying part of tools for computing semantics and standard graphical representations are used. We introduce a new visualization technique that draws an AF, together with an extension (as part of the input), as a 3-layer graph layout. Our technique supports the user to more easily explore the visualized AF, better understand extensions, and verify algorithms for computing semantics. To optimize the visual clarity and aesthetics of this layout, we propose to minimize edge crossings in our 3-layer drawing. We do so by an exact ILP-based approach, but also propose a fast heuristic pipeline. Via a quantitative evaluation, we show that the heuristic is feasible even for large instances, while producing at most twice as many crossings as an optimal drawing in most cases.
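The objective being minimized between adjacent layers is the classic count of pairwise edge crossings: two edges between the same pair of layers cross exactly when the orders of their endpoints are inverted. A small sketch of that counting step (the representation of edges as endpoint positions is an assumption for illustration, not the paper's data structure):

```python
def count_crossings(edges):
    """Count pairwise crossings among edges between two adjacent layers.
    `edges` is a list of (top_pos, bottom_pos) endpoint positions; two
    edges cross iff their top and bottom orders are inverted."""
    crossings = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (a1, b1), (a2, b2) = edges[i], edges[j]
            if (a1 - a2) * (b1 - b2) < 0:  # opposite orders on the two layers
                crossings += 1
    return crossings
```

Both the exact ILP and the heuristic pipeline would be scored against this same count; the quadratic loop above is fine for evaluation, while the ILP searches over vertex orderings to minimize it.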